AAAI.2016 - NLP and Text Mining | Cool Papers

#1 Extracting Topical Phrases from Clinical Documents

Author: Yulan He

In clinical documents, medical terms are often expressed in multi-word phrases. Traditional topic modelling approaches relying on the "bag-of-words" assumption are not effective in extracting topic themes from clinical documents. This paper proposes to first extract medical phrases using an off-the-shelf tool for medical concept mention extraction, and then train a topic model which takes a hierarchy of Pitman-Yor processes as prior for modelling the generation of phrases of arbitrary length. Experimental results on patients' discharge summaries show that the proposed approach outperforms the state-of-the-art topical phrase extraction model on both perplexity and topic coherence measure and finds more interpretable topics.

#2 Discourse Relations Detection via a Mixed Generative-Discriminative Framework

Authors: Jifan Chen ; Qi Zhang ; Pengfei Liu ; Xuanjing Huang

Word embeddings, which can better capture the fine-grained semantics of words, have proven to be useful for a variety of natural language processing tasks. However, because discourse structures describe the relationships between segments of discourse, word embeddings cannot be directly integrated to perform the task. In this paper, we introduce a mixed generative-discriminative framework, in which we use vector offsets between embeddings of words to represent the semantic relations between text segments and the Fisher kernel framework to convert a variable number of vector offsets into a fixed-length vector. In order to incorporate the weights of these offsets into the vector, we also propose the Weighted Fisher Vector. Experimental results on two different datasets show that the proposed method, without using manually designed features, can achieve better performance on recognizing discourse-level relations in most cases.
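
The vector-offset idea in this abstract can be illustrated with a toy sketch. The embedding values, the function name `pairwise_offsets`, and the choice to pair every word of the first segment with every word of the second are assumptions made for illustration, not the authors' implementation; the Fisher kernel step is omitted.

```python
from itertools import product


def pairwise_offsets(seg_a, seg_b, embeddings):
    """Return the embedding offset (b - a) for every cross-segment word pair.

    Words missing from `embeddings` are skipped. All names here are
    hypothetical; the paper's actual pairing scheme may differ.
    """
    offsets = []
    for wa, wb in product(seg_a, seg_b):
        if wa in embeddings and wb in embeddings:
            va, vb = embeddings[wa], embeddings[wb]
            offsets.append([y - x for x, y in zip(va, vb)])
    return offsets


# Toy 2-dimensional embeddings (made-up values).
toy = {"rain": [1.0, 0.0], "wet": [1.0, 1.0], "stayed": [0.0, 2.0]}
offsets = pairwise_offsets(["rain"], ["wet", "stayed"], toy)
```

The number of offsets grows with the lengths of the two segments, which is why the abstract needs a step that maps a variable number of offsets to a fixed-length vector.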

#3 Improving Recommendation of Tail Tags for Questions in Community Question Answering

Authors: Yu Wu ; Wei Wu ; Zhoujun Li ; Ming Zhou

We study tag recommendation for questions in community question answering (CQA). Tags, which represent the semantic summarization of questions, are useful for navigation and expert finding in CQA and can facilitate content consumption such as searching and mining on these web sites. The task is challenging, as both questions and tags are short and a large fraction of tags are tail tags which occur very infrequently. To solve these problems, we propose matching questions and tags not only by themselves, but also by similar questions and similar tags. The idea is then formalized as a model in which we calculate question-tag similarity using a linear combination of similarity with similar questions and tags weighted by tag importance. Question similarity, tag similarity, and tag importance are learned in a supervised random walk framework by fusing multiple features. Our model thus can not only accurately identify question-tag similarity for head tags, but also improve the accuracy of recommendation of tail tags. Experimental results show that the proposed method significantly outperforms state-of-the-art methods on tag recommendation for questions. In particular, it improves tail tag recommendation accuracy by a large margin.

#4 Distant IE by Bootstrapping Using Lists and Document Structure

Authors: Lidong Bing ; Mingyang Ling ; Richard Wang ; William Cohen

Distant labeling for information extraction (IE) suffers from noisy training data. We describe a way of reducing the noise associated with distant IE by identifying coupling constraints between potential instance labels. As one example of coupling, items in a list are likely to have the same label. A second example of coupling comes from analysis of document structure: in some corpora, sections can be identified such that items in the same section are likely to have the same label. Such sections do not exist in all corpora, but we show that augmenting a large corpus with coupling constraints from even a small, well-structured corpus can improve performance substantially, doubling F1 on one task.

#5 Joint Word Segmentation, POS-Tagging and Syntactic Chunking

Authors: Chen Lyu ; Yue Zhang ; Donghong Ji

Chinese chunking has traditionally been solved by assuming gold standard word segmentation. We find that the accuracies drop drastically when automatic segmentation is used. Inspired by the fact that chunking knowledge can potentially improve segmentation, we explore a joint model that performs segmentation, POS-tagging and chunking simultaneously. In addition, to address the sparsity of full chunk features, we employ a semi-supervised method to derive chunk cluster features from large-scale automatically-chunked data. Results show the effectiveness of the joint model with semi-supervised features.

#6 Improving Twitter Sentiment Classification Using Topic-Enriched Multi-Prototype Word Embeddings

Authors: Yafeng Ren ; Yue Zhang ; Meishan Zhang ; Donghong Ji

It has been shown that learning distributed word representations is highly useful for Twitter sentiment classification. Most existing models rely on a single distributed representation for each word. This is problematic for sentiment classification because words are often polysemous and each word can contain different sentiment polarities under different topics. We address this issue by learning topic-enriched multi-prototype word embeddings (TMWE). In particular, we develop two neural networks which 1) learn word embeddings that better capture tweet context by incorporating topic information, and 2) learn topic-enriched multiple prototype embeddings for each word. Experiments on Twitter sentiment benchmark datasets in SemEval 2013 show that TMWE outperforms the top system with hand-crafted features, and the current best neural network model.

#7 Temporal Topic Analysis with Endogenous and Exogenous Processes

Authors: Baiyang Wang ; Diego Klabjan

We consider the problem of modeling temporal textual data taking endogenous and exogenous processes into account. Such text documents arise in real world applications, including job advertisements and economic news articles, which are influenced by the fluctuations of the general economy. We propose a hierarchical Bayesian topic model which imposes a "group-correlated" hierarchical structure on the evolution of topics over time incorporating both processes, and show that this model can be estimated from Markov chain Monte Carlo sampling methods. We further demonstrate that this model captures the intrinsic relationships between the topic distribution and the time-dependent factors, and compare its performance with latent Dirichlet allocation (LDA) and two other related models. The model is applied to two collections of documents to illustrate its empirical performance: online job advertisements from DirectEmployers Association and journalists' postings on BusinessInsider.com.

#8 Age of Exposure: A Model of Word Learning

Authors: Mihai Dascalu ; Danielle McNamara ; Scott Crossley ; Stefan Trausan-Matu

Textual complexity is widely used to assess the difficulty of reading materials and writing quality in student essays. At a lexical level, word complexity can represent a building block for creating a comprehensive model of lexical networks that adequately estimates learners’ understanding. In order to best capture how lexical associations are created between related concepts, we propose automated indices of word complexity based on Age of Exposure (AoE). AoE indices computationally model the lexical learning process as a function of a learner's experience with language. This study describes a proof of concept based on a large-scale learning corpus (i.e., TASA). The results indicate that AoE indices yield strong associations with human ratings of age of acquisition, word frequency, entropy, and human lexical response latencies, providing evidence of convergent validity.

#9 Improving Opinion Aspect Extraction Using Semantic Similarity and Aspect Associations

Authors: Qian Liu ; Bing Liu ; Yuanlin Zhang ; Doo Soon Kim ; Zhiqiang Gao

Aspect extraction is a key task of fine-grained opinion mining. Although it has been studied by many researchers, it remains highly challenging. This paper proposes a novel unsupervised approach to make a major improvement. The approach is based on the framework of lifelong learning and is implemented with two forms of recommendations that are based on semantic similarity and aspect associations, respectively. Experimental results using eight review datasets show the effectiveness of the proposed approach.

#10 Collective Supervision of Topic Models for Predicting Surveys with Social Media

Authors: Adrian Benton ; Michael Paul ; Braden Hancock ; Mark Dredze

This paper considers survey prediction from social media. We use topic models to correlate social media messages with survey outcomes and to provide an interpretable representation of the data. Rather than rely on fully unsupervised topic models, we use existing aggregated survey data to inform the inferred topics, a class of topic model supervision referred to as collective supervision. We introduce and explore a variety of topic model variants and provide an empirical analysis, with conclusions of the most effective models for this task.

#11 A Probabilistic Soft Logic Based Approach to Exploiting Latent and Global Information in Event Classification

Authors: Shulin Liu ; Kang Liu ; Shizhu He ; Jun Zhao

Global information such as event-event association, and latent local information such as fine-grained entity types, are crucial to event classification. However, existing methods typically focus on sophisticated local features such as part-of-speech tags, either fully or partially ignoring the aforementioned information. By contrast, this paper focuses on fully employing them for event classification. We notice that it is difficult for previous methods to encode some global information such as event-event association. To resolve this problem, we propose a feasible approach which encodes global information in the form of logic using the Probabilistic Soft Logic model. Experimental results show that our proposed approach improves on state-of-the-art methods and achieves the best F1 score to date on the ACE data set.

#12 TGSum: Build Tweet Guided Multi-Document Summarization Dataset

Authors: Ziqiang Cao ; Chengyao Chen ; Wenjie Li ; Sujian Li ; Furu Wei ; Ming Zhou

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyper-link often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this aim, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries could be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. We release this dataset for further research.

#13 Global Distant Supervision for Relation Extraction

Authors: Xianpei Han ; Le Sun

Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break the bottleneck of labeled data, a promising approach is to exploit easily obtained indirect supervision knowledge – which we usually refer to as distant supervision (DS). However, traditional DS methods mostly only exploit one specific kind of indirect supervision knowledge – the relations/facts in a given knowledge base, thus often suffer from the problem of lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches.

#14 Personalized Microblog Sentiment Classification via Multi-Task Learning

Authors: Fangzhao Wu ; Yongfeng Huang

Microblog sentiment classification is an interesting and important research topic with wide applications. Traditional microblog sentiment classification methods usually use a single model to classify the messages from different users and omit individuality. However, microblogging users frequently embed their personal character, opinion bias and language habits into their messages, and the same word may convey different sentiments in messages posted by different users. In this paper, we propose a personalized approach for microblog sentiment classification. In our approach, each user has a personalized sentiment classifier, which is decomposed into two components, a global one and a user-specific one. Our approach can capture the individual personality and at the same time leverage the common sentiment knowledge shared by all users. The personalized sentiment classifiers of massive users are trained in a collaborative way based on multi-task learning to handle the data sparseness problem. In addition, we incorporate users' social relations into our model to strengthen the learning of the personalized models. Moreover, we propose a distributed optimization algorithm to solve our model in parallel. Experiments on two real-world microblog sentiment datasets validate that our approach can improve microblog sentiment classification accuracy effectively and efficiently.
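
The decomposition of each user's classifier into a global and a user-specific component can be pictured with a small linear sketch. The weight values, user names, and the function `personalized_score` are hypothetical; the multi-task training, social-relation regularization, and distributed optimization described in the abstract are not shown.

```python
def personalized_score(features, global_w, user_w):
    """Linear score using the sum of global and user-specific weights.

    `features` maps feature name -> value. A positive score is read as
    positive sentiment. This is an illustrative assumption, not the
    authors' exact model.
    """
    return sum(
        (global_w.get(name, 0.0) + user_w.get(name, 0.0)) * value
        for name, value in features.items()
    )


# Made-up weights: "sick" is negative for most users, but one user
# (here called "alice") uses it as praise.
global_w = {"sick": -1.0, "great": 1.0}
user_ws = {"alice": {"sick": 1.5}, "bob": {}}

tweet = {"sick": 1.0}
alice_score = personalized_score(tweet, global_w, user_ws["alice"])
bob_score = personalized_score(tweet, global_w, user_ws["bob"])
```

With these toy weights the same word yields a positive score for one user and a negative score for the other, which is the effect the abstract attributes to personalization.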

#15 Aggregating Inter-Sentence Information to Enhance Relation Extraction

Authors: Hao Zheng ; Zhoujun Li ; Senzhang Wang ; Zhao Yan ; Jianshe Zhou

Previous work for relation extraction from free text is mainly based on intra-sentence information. As relations might be mentioned across sentences, inter-sentence information can be leveraged to improve distantly supervised relation extraction. To effectively exploit inter-sentence information, we propose a ranking based approach, which first learns a scoring function based on a listwise learning-to-rank model and then uses it for multi-label relation extraction. Experimental results verify the effectiveness of our method for aggregating information across sentences. Additionally, to further improve the ranking of high-quality extractions, we propose an effective method to rank relations from different entity pairs. This method can be easily integrated into our overall relation extraction framework, and boosts the precision significantly.

#16 Gated Neural Networks for Targeted Sentiment Analysis

Authors: Meishan Zhang ; Yue Zhang ; Duy-Tin Vo

Targeted sentiment analysis classifies the sentiment polarity towards each target entity mention in given text documents. Seminal methods extract manual discrete features from automatic syntactic parse trees in order to capture semantic information of the enclosing sentence with respect to a target entity mention. Recently, it has been shown that competitive accuracies can be achieved without using syntactic parsers, which can be highly inaccurate on noisy text such as tweets. This is achieved by applying distributed word representations and rich neural pooling functions over a simple and intuitive segmentation of tweets according to target entity mentions. In this paper, we extend this idea by proposing a sentence-level neural model to address the limitation of pooling functions, which do not explicitly model tweet-level semantics. First, a bi-directional gated neural network is used to connect the words in a tweet so that pooling functions can be applied over the hidden layer instead of words for better representing the target and its contexts. Second, a three-way gated neural network structure is used to model the interaction between the target mention and its surrounding contexts. Experiments show that our proposed model gives significantly higher accuracies compared to the current best method for targeted sentiment analysis.

#17 A Joint Model for Question Answering over Multiple Knowledge Bases

Authors: Yuanzhe Zhang ; Shizhu He ; Kang Liu ; Jun Zhao

As the number of knowledge bases (KBs) grows rapidly, the problem of question answering (QA) over multiple KBs has drawn more attention. The most significant distinction between multiple KB-QA and single KB-QA is that the former must consider the alignments between KBs. The pipeline strategy first constructs the alignments independently, and then uses the obtained alignments to construct queries. However, alignment construction is not a trivial task, and the noise it introduces would be passed on to query construction. By contrast, we notice that alignment construction and query construction are interactive steps, and jointly considering them would be beneficial. To this end, we present a novel joint model based on integer linear programming (ILP), uniting these two procedures into a uniform framework. The experimental results demonstrate that the proposed approach outperforms state-of-the-art systems, and is able to improve the performance of both alignment construction and query construction.

#18 News Verification by Exploiting Conflicting Social Viewpoints in Microblogs

Authors: Zhiwei Jin ; Juan Cao ; Yongdong Zhang ; Jiebo Luo

Fake news spreading in social media severely jeopardizes the veracity of online content. Fortunately, with the interactive and open features of microblogs, skeptical and opposing voices against fake news always arise along with it. The conflicting information, ignored by existing studies, is crucial for news verification. In this paper, we take advantage of this "wisdom of crowds" information to improve news verification by mining conflicting viewpoints in microblogs. First, we discover conflicting viewpoints in news tweets with a topic model method. Based on identified tweets' viewpoints, we then build a credibility propagation network of tweets linked with supporting or opposing relations. Finally, with iterative deduction, the credibility propagation on the network generates the final evaluation result for news. Experiments conducted on a real-world data set show that the news verification performance of our approach significantly outperforms those of the baseline approaches.

#19 Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

Authors: Guangyi Lv ; Tong Xu ; Enhong Chen ; Qi Liu ; Yi Zheng

Recent years have witnessed a boom in online shared media content, which raises significant challenges for effective management and retrieval. Though considerable effort has been made, precise retrieval of video shots with certain topics has been largely ignored. At the same time, due to the popularity of novel time-sync comments, or so-called "bullet-screen comments", video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propose a novel video understanding framework to assign temporal labels to highlighted video shots. To be specific, due to the informal expression of bullet-screen comments, we first propose a temporal deep structured semantic model (T-DSSM) to represent comments as semantic vectors by taking advantage of their temporal correlation. Then, video highlights are recognized and labeled via semantic vectors in a supervised way. Extensive experiments on a real-world dataset show that our framework can effectively label video highlights with a significant margin over the baselines, which supports the potential of our framework for video understanding as well as for the interpretation of bullet-screen comments.

#20 Argument Mining from Speech: Detecting Claims in Political Debates

Authors: Marco Lippi ; Paolo Torroni

The automatic extraction of arguments from text, also known as argument mining, has recently become a hot topic in artificial intelligence. Current research has only focused on linguistic analysis. However, in many domains where communication may be also vocal or visual, paralinguistic features too may contribute to the transmission of the message that arguments intend to convey. For example, in political debates a crucial role is played by speech. The research question we address in this work is whether in such domains one can improve claim detection for argument mining, by employing features from text and speech in combination. To explore this hypothesis, we develop a machine learning classifier and train it on an original dataset based on the 2015 UK political elections debate.

#21 A Joint Model for Entity Set Expansion and Attribute Extraction from Web Search Queries

Authors: Zhenzhong Zhang ; Le Sun ; Xianpei Han

Entity Set Expansion (ESE) and Attribute Extraction (AE) are usually treated as two separate tasks in Information Extraction (IE). However, the two tasks are tightly coupled, and each task can benefit significantly from the other by leveraging the inherent relationship between entities and attributes. That is, 1) an attribute is important if it is shared by many typical entities of a class; 2) an entity is typical if it owns many important attributes of a class. Based on this observation, we propose a joint model for ESE and AE, which models the inherent relationship between entities and attributes as a graph. Then a graph reinforcement algorithm is proposed to jointly mine entities and attributes of a specific class. Experimental results demonstrate the superiority of our method for discovering both new entities and new attributes.
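
The mutual dependence between entity typicality and attribute importance stated in this abstract resembles a HITS-style iteration on a bipartite graph. The sketch below is a simplified stand-in under that assumption, not the paper's graph reinforcement algorithm; the function name `reinforce`, the unweighted edges, and the toy data are all hypothetical.

```python
import math


def reinforce(edges, iterations=20):
    """HITS-style mutual reinforcement on an entity-attribute graph.

    `edges` is a set of (entity, attribute) pairs. Returns two dicts:
    entity typicality and attribute importance (both L2-normalised).
    """
    entities = {e for e, _ in edges}
    attributes = {a for _, a in edges}
    typicality = {e: 1.0 for e in entities}
    importance = {a: 1.0 for a in attributes}
    for _ in range(iterations):
        # An attribute is important if many typical entities share it.
        importance = {
            a: sum(typicality[e] for e, a2 in edges if a2 == a)
            for a in attributes
        }
        # An entity is typical if it owns many important attributes.
        typicality = {
            e: sum(importance[a] for e2, a in edges if e2 == e)
            for e in entities
        }
        # Normalise so the scores stay bounded across iterations.
        for scores in (importance, typicality):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return typicality, importance


# Toy "city" class with one noisy candidate entity ("apple").
edges = {
    ("paris", "population"), ("paris", "mayor"),
    ("london", "population"), ("london", "mayor"),
    ("apple", "population"),
}
typicality, importance = reinforce(edges)
```

In this toy graph the noisy candidate ends up with a lower typicality score than the two cities, because it shares only one of the class attributes.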

#22 To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation

Authors: Christian Hadiwinoto ; Yang Liu ; Hwee Tou Ng

Reordering poses a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a novel reordering approach utilizing sparse features based on dependency word pairs. Each instance of these features captures whether two words, which are related by a dependency link in the source sentence dependency parse tree, follow the same order or are swapped in the translation output. Experiments on Chinese-to-English translation show a statistically significant improvement of 1.21 BLEU points using our approach, compared to a state-of-the-art statistical MT system that incorporates prior reordering approaches.
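
The swap-or-not feature described in this abstract can be sketched as follows. The feature-name format, the function `pair_order_features`, the one-to-one alignment dictionary, and the toy sentence are assumptions for illustration; the paper's actual feature templates and decoder integration are not reproduced here.

```python
def pair_order_features(src_words, dep_links, alignment):
    """Emit one sparse feature per dependency word pair.

    `dep_links` holds (head_index, dependent_index) pairs over
    `src_words`; `alignment` maps a source index to a target position.
    Pairs with an unaligned word are skipped.
    """
    features = []
    for head, dep in dep_links:
        if head not in alignment or dep not in alignment:
            continue
        left, right = sorted((head, dep))  # the pair in source order
        if alignment[left] == alignment[right]:
            continue  # same target position: no order to compare
        order = "in_order" if alignment[left] < alignment[right] else "swapped"
        features.append(f"{src_words[head]}|{src_words[dep]}|{order}")
    return features


# Toy example in pinyin: "wo zai Beijing gongzuo" -> "I work in Beijing".
src = ["wo", "zai", "Beijing", "gongzuo"]
links = [(3, 0), (3, 1), (1, 2)]   # gongzuo->wo, gongzuo->zai, zai->Beijing
align = {0: 0, 3: 1, 1: 2, 2: 3}   # source index -> target position
feats = pair_order_features(src, links, align)
```

Here the verb and its prepositional modifier change relative order in the English output, so that pair yields a "swapped" feature while the other two pairs stay "in_order".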

#23 Identifying Search Keywords for Finding Relevant Social Media Posts

Authors: Shuai Wang ; Zhiyuan Chen ; Bing Liu ; Sherry Emery

In almost any application of social media analysis, the user is interested in studying a particular topic or research question. Collecting posts or messages relevant to the topic from a social media source is a necessary step. Due to the huge size of social media sources (e.g., Twitter and Facebook), one has to use some topic keywords to search for possibly relevant posts. However, gathering a good set of keywords is a very tedious and time-consuming task. It often involves a lengthy iterative process of searching and manual reading. In this paper, we propose a novel technique to help the user identify topical search keywords. Our experiments are carried out on identifying such keywords for five (5) real-life application topics to be used for searching relevant tweets from the Twitter API. The results show that the proposed method is highly effective.

#24 A Semi-Supervised Learning Approach to Why-Question Answering

Authors: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Ryu Iida ; Masahiro Tanaka ; Julien Kloetzer

We propose a semi-supervised learning method for improving why-question answering (why-QA). The key of our method is to generate training data (question-answer pairs) from causal relations in texts such as "[Tsunamis are generated](effect) because [the ocean's water mass is displaced by an earthquake](cause)." A naive method for the generation would be to make a question-answer pair by simply converting the effect part of the causal relations into a why-question, like "Why are tsunamis generated?" from the above example, and using the source text of the causal relations as an answer. However, in our preliminary experiments, this naive method actually failed to improve the why-QA performance. The main reason was that the machine-generated questions were often incomprehensible like "Why does (it) happen?", and that the system suffered from overfitting to the results of our automatic causality recognizer. Hence, we developed a novel method that effectively filters out incomprehensible questions and retrieves from texts answers that are likely to be paraphrases of a given causal relation. Through a series of experiments, we showed that our approach significantly improved the precision of the top answer by 8% over the current state-of-the-art system for Japanese why-QA.

#25 Tweet Timeline Generation with Determinantal Point Processes

Authors: Jin-ge Yao ; Feifan Fan ; Wayne Xin Zhao ; Xiaojun Wan ; Edward Chang ; Jianguo Xiao

The task of tweet timeline generation (TTG) aims at selecting a small set of representative tweets to generate a meaningful timeline and providing enough coverage for a given topical query. This paper presents an approach based on determinantal point processes (DPPs) by jointly modeling the topical relevance of each selected tweet and overall selectional diversity. Aiming at better treatment for balancing relevance and diversity, we introduce two novel strategies, namely spectral rescaling and topical prior. Extensive experiments on the public TREC 2014 dataset demonstrate that our proposed DPP model along with the two strategies can achieve fairly competitive results against the state-of-the-art TTG systems.
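
The trade-off between relevance and diversity in a DPP can be illustrated with a greedy selection over a kernel of the common form L_ij = q_i * S_ij * q_j. This is a generic sketch, not the paper's model: greedy selection is a standard approximation, the relevance and similarity values are made up, and the spectral rescaling and topical prior strategies from the abstract are not implemented.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, result = len(m), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def greedy_dpp(quality, similarity, k):
    """Greedily pick up to k items that maximise det(L_S)."""
    n = len(quality)
    kernel = [[quality[i] * similarity[i][j] * quality[j] for j in range(n)]
              for i in range(n)]
    chosen = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(n):
            if i in chosen:
                continue
            subset = chosen + [i]
            d = det([[kernel[a][b] for b in subset] for a in subset])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break
        chosen.append(best)
    return chosen


# Tweets 0 and 1 are near-duplicates; tweet 2 is less relevant but different.
relevance = [1.0, 0.9, 0.7]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
picked = greedy_dpp(relevance, sim, 2)
```

With these toy values the second pick is the less relevant but dissimilar tweet rather than the near-duplicate, since adding a near-duplicate shrinks the determinant.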